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3.5 Finite Differences and Fast Poisson Solvers 



It is extremely unusual to use eigenvectors to solve a linear system KU = F. You 
need to know all the eigenvectors of K, and (much more than that) the eigenvector 
matrix S must be especially fast to work with. Both S and S -1 are required, because 
K~ l = SA~ 1 S~ 1 . The eigenvalue matrices A and A -1 are diagonal and quick. When 
the derivatives in Poisson's equation — u xx — u yy = f(x, y) are replaced by second 
differences, we do know the eigenvectors (discrete sines in the columns of S). Then 
the Fourier transform quickly inverts and multiplies by S. 

On a square mesh those differences have —1, 2, —1 in the x-direction and —1, 2, — 1 
in the y-direction (divided by h 2 , where h = meshwidth). Figure 3.20 shows how 
the second differences combine into a u 5-point molecule" for the discrete Laplacian. 
Boundary values u = u (x, y) are assumed to be given along the sides of a unit square. 

The square mesh has A" interior points in each direction (N = 5 in the figure). 
In this case there are n = N 2 = 25 unknown mesh values Uij. When the molecule is 
centered at position the discrete Poisson equation gives a row of K2DU = F: 

K2D U = F Auij -Uij-i -iH-ij -u i+1 j ~u hj+ i = h 2 f(ih,jh) = .(1) 

The inside rows of K2D have five nonzero entries 4, —1, —1, —1, —1. When 
is next to a boundary point of the square, the known value of Uq at that neigh- 
boring boundary point moves to the right side of equation (1). It becomes part of 
the vector F, and a —1 drops out of the corresponding row of K. So i^2D has five 
nonzeros on inside rows and fewer nonzeros on next-to-boundary rows. 
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Figure 3.20: 5-point molecules at inside points (fewer — l's next to boundary). 

This matrix K2D is sparse. Using blocks of size N, we can create the 2D matrix 
from the familiar N by N second difference matrix K. Number the nodes of the 
square a row at a time (this "natural numbering" is not necessarily best). Then the 
—l's for the neighbor above and the neighbor below are A^ positions away from the 
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main diagonal of K2T). The 2D matrix is block tridiagonal with tridiagonal blocks: 

(2) 
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-1 
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K2D 



K + 2I -I 

-I K + 2I -I 



Size N 
Time N 



-I K + 2I 

Size n = N 2 Bandwidth w = N 

Space nw — N 3 Time riw 2 = N 4 



The matrix K2D has 4's down the main diagonal in equation (1). Its bandwidth 
w = N is the distance from the main diagonal to the nonzeros in —I. Many of the 
spaces in between are filled during elimination! This is discussed in Section 6.1. 

Kronecker product One good way to create K2D from K and / is by the kron 

command. When A and B have size N by N, the matrix kron(A, B) is N 2 by N 2 . 
Each number is replaced by the block a^B. 

To take second differences in all rows at the same time, kron(J, K) produces a 
block diagonal matrix of K's. In the ^-direction, kron(ii', I) multiplies —1 and 2 and 
— 1 by J (dealing with a column of meshpoints at a time). Add x and y differences: 



K2B = kron(7, K) + kron(K, I) 
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(3) 



This sum agrees with the 5-point matrix in (2). The computational question is how 
to work with K2D. We will propose three methods: 

1. Elimination in a good order (not using the special structure of K2D) 

2. Fast Poisson Solver (applying the FFT = Fast Fourier Transform) 

3. Odd-Even Reduction (since K2D is block tridiagonal). 

The novelty is in the Fast Poisson Solver, which uses the known eigenvalues and 
eigenvectors of K and K2D. It is strange to solve linear equations KU = F by 
expanding F and U in eigenvectors, but here it is extremely successful. 



Elimination and Fill-in 

For most two-dimensional problems, elimination is the way to go. The matrix from 
a partial differential equation is sparse (like K2D). It is banded but the bandwidth 
is not so small. (Meshpoints cannot be numbered so that all five neighbors in the 
molecule receive nearby numbers.) Figure 3.21 has points 1, . . . , N along the first 
row, then a row at a time going up the square. The neighbors above and below point 
j have numbers j — N and j + N. 
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Ordering by rows produces the — l's in K2D that are N places away from the 
diagonal. The matrix K2D has bandwidth N. The key point is that elimination 
fills in the zeros inside the band. We add row 1 (times ?) to row 2, to eliminate 
the —1 in position (2, 1). But the last —1 in row 1 produces a new — | in row 2. A 
zero inside the band has disappeared. As elimination continues from A to U, virtually 
the whole band is filled in. 

In the end, U has about 5 times 25 nonzeros (this is N 3 , the space needed to store 
U). There will be about N nonzeros next to the pivot when we reach a typical row, 
and N nonzeros below the pivot. Row operations to remove those nonzeros will require 
up to N 2 multiplications, and there are N 2 pivots. So the count of multiplications is 
about 25 times 25 (this is N 4 , for elimination in 2D). 

Figure 3.21: Typical rows of K2D have 5 nonzeros. Elimination fills in the band. 

Section 6.1 will propose a different numbering of the meshpoints, to reduce the 
fill-in that we see in U. This reorders the rows of K2D by a permutation matrix 
P, and the columns by P T . The new matrix P(K2D)P T is still symmetric, but 
elimination (with fill-in) proceeds in a completely different order. The MATLAB 
command symamd(i^) produces a nearly optimal choice of P. 

Elimination is fast in two dimensions (but a Fast Poisson Solver is faster ! ) . In 
three dimensions the matrix size is N 3 and the bandwidth is N 2 . By numbering the 
nodes a plane at a time, vertical neighbors are N 2 nodes apart. The operation count 
for elimination becomes iV 7 , which can be seriously large. Chapter 6 on Solving Large 
Systems will introduce badly needed alternatives to elimination in 3D. 

Solvers Using Eigenvalues 

Our matrices K and K2D are extremely special. We know the eigenvalues and eigen- 
vectors of the second-difference matrix K. The eigenvalues have the special form 
A = 2 — 2 cos 9, and the eigenvectors are discrete sines. There will be a similar pattern 
for K2D, which is formed in a neat way from K (by Kronecker product). The Poisson 
Solver uses those eigenvalues and eigenvectors to solve (fT2D)([/2D) = (F2D). On a 
square mesh it is much faster than elimination. 

Here is the idea, first in one dimension. The matrix K has eigenvalues Ai, . . . , Ajv 
and eigenvectors y%, . . . , y^. There are three steps to the solution of KU = F: 

1. Expand F as a combination F = a±yi + • • • + a^y^ of the eigenvectors 

2. Divide each au by Afc 

3. Recombine eigenvectors into U = (ai/Ai) yi + ■ ■ ■ + (cln/Xn) y^. 
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The success of the method depends on the speed of steps 1 and 3. Step 2 is fast. 

To see that U in step 3 is correct, multiply it by the matrix K. Every eigenvector 
gives Ky = Xy. That cancels the A in each denominator. The result is that KU 
agrees with the vector F in step 1. 

Now look at the calculation required in each step, using matrices. Suppose S is 
the eigenvector matrix, with the eigenvectors yi, . . . , y^ of K in its columns. Then 
the coefficients ai, . . . , come by solving Sa = F: 



Step 1 Solve Sa = F 



Vi 



Vn 



a i 



aw 



am H h (InVn = F . (4) 



Thus a = S l F. Then step 2 divides the a's by the A's to find A 



Ar l S- l F. 



(The eigenvalue matrix A is just the diagonal matrix of A's.) Step 3 uses those 
coefficients a^/Xk in recombining the eigenvectors into the solution vector U: 



Step 3 



U 



yi 



Vn 



SA^a = SA-'S-'F. 



—1 c-l 



(5) 



I AjV 

You see the matrix K~ x = SA _1 S~ 1 appearing in that last formula. We have K~ X F 
because K itself is SAS -1 — the usual diagonalization of a matrix. The eigenvalue 
method is using this SAS -1 factorization instead of K = LU from elimination. 

The speed of steps 1 and 3 depends on multiplying quickly by S _1 and S. Those 
are full matrices, not sparse like K. Normally they both need A^ 2 operations in one 
dimension (where the matrix size is N). But the "sine eigenvectors" in S give the 
Discrete Sine Transform, and the Fast Fourier Transform executes S and S -1 in 
A^ log 2 A^ steps. In one dimension this is not as fast as cN from elimination, but in 
two dimensions A^ 2 log(A r2 ) easily wins. 

Column k of the matrix S contains the eigenvector y^- The number Sjk = sin -^y 
is the jth component of that eigenvector. For our example with N = 5 and A^+ 1 = 6, 
you could list the sines of every multiple of tt/6. Here are those numbers: 

i I 

2' 2 ' ' 2 ' 2'' 

The fcth column of S (kth eigenvector y^) takes every fcth number from that list: 
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Those eigenvectors are orthogonal ! This is guaranteed by the symmetry of K. All 
these eigenvectors have length 3 = (N + l)/2. Dividing each column by a/3, we have 
orthonormal eigenvectors. S/y/3 is an orthogonal matrix Q, with Q T Q = I. 
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In this special case, S and Q are also symmetric. So Q 1 = Q T = Q. 

Notice that y k has k — 1 changes of sign. It comes from k loops of the sine curve. 
The eigenvalues are increasing: A = 2 - V3, 2-1, 2-0, 2 + 1, 2 + V3. Those 
eigenvalues add to 10, which is the sum down the diagonal (the trace) of K 5 . The 
product of the 5 eigenvalues (easiest by pairs) confirms that det^s) = 6. 



Fast Poisson Solvers 



To extend this eigenvalue method to two dimensions, we need the eigenvalues and 
eigenvectors of K2D. The key point is that the iV 2 eigenvectors of K2D are separable. 
Each eigenvector y^i separates into a product of sines: 

ikn jln 

Eigenvectors y k i The component of y k i is sm sm ^ ^ ■ (6) 

When you multiply that eigenvector by i^2D, you have second differences in the re- 
direction and ^-direction. The second differences of the first sine (x-direction) produce 
a factor \ k = 2 — 2 cos j^tj- This is the eigenvalue of K in ID. The second differences 
of the other sine (^-direction) produce a factor A/ = 2 — 2 cos So the eigenvalue 
in two dimensions is the sum \ k + A; of one-dimensional eigenvalues: 



(K2B)y kl = X kl y kl X kl = (2 - 2 cos ^) + (2 - 2 cos ^) . 

Now the solution of K2D U = F comes by a two-dimensional sine transform: 
FiJ = £ £ a*i sm j^pj sm — U hJ = £ £ — sm ^ sm — ( 

Again we find the a's, divide by the A's, and build U from the eigenvectors in S: 



(7) 



Step 1 a = S^F Step 2 A _1 a = A^S^F Step 3 U = SA^S^F 



Swartztrauber [SIAM Review 19 (1977) 490] gives the operation count 2N 2 \og 2 N. 
This uses the Fast Sine Transform (based on the FFT) to multiply by S^ 1 and S. 
The Fast Fourier Transform is explained in Section 4.3. 

Note We take this chance to notice the good properties of a Kronecker product 
C = kxon(A, B). Suppose A and B have their eigenvectors in the columns of Sa and 
Sb- The eigenvalues are in A^ and A#. Then we know Sc and A^: 

The eigenvectors o/kron(A, B) are in ~kron(SA<> Sb) • 
The eigenvalues are in kron(A^, Ab). 

The diagonal blocks in kron(A J 4, As) are entries \k{A) times the diagonal matrix A#. 
So the eigenvalues \ k i of C are just the products \ k (A)\i(B). 
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In our case A and B were I and K. The matrix K2D added the two products 
kron(J, K) and kron(if, I). Normally we cannot know the eigenvectors and eigen- 
values of a matrix sum — except when the matrices commute. Since all our matrices 
are formed from K, these Kronecker products do commute. This gives the separable 
eigenvectors and eigenvalues in (6) and (7). 



Cyclic Odd-Even Reduction 

There is an entirely different (and very simple) approach to KU = F. I will start in 
one dimension, by writing down three rows of the usual second difference equation: 

Row i - 1 -Ui-2 + 2C/ i _ 1 - Ui = Ft-i 

Row % -Ui-\ + 2Ui - U i+1 = Ft (10) 

Row i + 1 -Ui + 2U i+1 - U i+2 = F i+1 

Multiply the middle equation by 2, and add. This eliminates C/j_i and Ui + i\ 

Odd-even reduction in ID -U i-2 + 2U { - U l+2 = + 2F l + F l+l . (11) 

Now we have a half-size system, involving only half of the ZP& (with even indices). 
The new system (11) has the same tridiagonal form as before. When we repeat, cyclic 
reduction produces a quarter-size system. Eventually we can reduce KU = F to a 
very small problem, and then cyclic back-substitution produces the whole solution. 

How does this look in two dimensions ? The big matrix i^2D is block triangular: 



K2B 



A -I 
-I A 



-I A 



with A = K + 2I from equation (4) . (12) 



The three equations in (10) become block equations for whole rows of N mesh values. 
We are taking the unknowns U, = (Un, . . . , U^) a row at a time. If we write three 
rows of (10), the block A replaces the number 2 in the scalar equation. The block 
—I replaces the number —1. To reduce (K2D)(U2D) = (E2D) to a half-size system, 
multiply the middle equation (with i even) by A and add the three block equations: 

Reduction in 2D -/U z _ 2 + (A 2 - 2/)U z - IV l+2 = F,_i + AF, + F i+1 . (13) 

The new half-size matrix is still block tridiagonal. The diagonal blocks that were 
previously A in (12) are now A 2 - 21, with the same eigenvectors. The unknowns are 
the |iV 2 values Uij at meshpoints with even indices i. 

The bad point is that each new diagonal block A 2 — 21 has five diagonals, where 
the original block A = K + 2I had three diagonals. This bad point gets worse as cyclic 
reduction continues. At step r, the diagonal blocks become A r = A 2 _ 1 — 21 and their 
bandwidth doubles. We could find tridiagonal factors (A - y/2I) (A + \p2I) = A 2 - 21, 
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but the number of factors grows quickly. Storage and computation and roundoff error 
are increasing too rapidly with more reduction steps. 

Stable variants of cyclic reduction were developed by Buneman and Hockney. The 
clear explanation by Buzbee, Golub, and Nielson [SIAM Journal of Numerical Anal- 
ysis 7 (1970) 627] allows other boundary conditions and other separable equations, 
coming from polar coordinates or convection terms like Cdu/dx + Ddu/dy. After m 
steps of cyclic reduction, Hockney went back to a Fast Poisson Solver. 

This combination FACR(m) is widely used, and the optimal number of cyclic 
reduction steps (before the FFT takes over) is small. For N = 128 a frequent choice 
is m = 2. Asymptotically m pt grows like log log N and the operation count for 
FACR(m op t) is 3iV 2 log log N. In practical scientific computing with N 2 unknowns 
(or with N 3 unknowns in three dimensions), a Fast Poisson Solver is a winner. 

****** Add code for Fast Poisson ******* 

Problem Set 3.5 
Problems 1— are for readers who get enthusiastic about kron. 

1 Why is the transpose of C = kron(v4, B) equal to kron(A T , B T ) ? Why is the 
inverse equal to C^ 1 = kron(A _1 , B^ 1 ) ? You have to transpose each block a^B 
of the Kronecker product C, and then patiently multiply CC~ X by blocks. 

C is symmetric (or orthogonal) when A and B are symmetric (or orthogonal). 

2 Why is the matrix C = kron(A, B) times the matrix D = kron(S', T) equal to 
CD = kron(ylS', BT) ? This needs even more patience with block multiplica- 
tion. The inverse CC~ l = kron(J, I) = I is a special case. 

Note Suppose S and T are eigenvector matrices for A and B. From AS = 
SA A and BT = TA B we have CD = kron(AS, BT) = kron(SA A ,TA B ). Then 
CD = D kron(A j 4, A#) = DAq- So D = kron(S', T) is the eigenvector matrix 
for C. 



